Quality Estimation of Machine Translated Texts based on Direct Evidence from Training Data
Kumari, Vibhuti, Kavi, Narayana Murthy
Current Machine Translation systems achieve very good results on a growing variety of language pairs and data sets. However, it is now well known that they produce fluent translation outputs that can nevertheless contain serious meaning errors. The Quality Estimation task deals with estimating the quality of translations produced by a Machine Translation system without depending on reference translations. A number of approaches have been suggested over the years. In this paper we show that the parallel corpus used as training data for the MT system holds direct clues for estimating the quality of the translations that system produces. Our experiments show that this simple and direct method holds promise for quality estimation of translations produced by any purely data-driven machine translation system.
- North America > United States > Maryland > Baltimore (0.04)
- Europe > Portugal > Lisbon > Lisbon (0.04)
- Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
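The abstract's core idea can be illustrated with a small sketch: score a translation by how much of it is directly attested in the target side of the MT system's training data. The n-gram coverage scheme below is an invented illustration of that idea, not the paper's exact formulation.

```python
# Hypothetical sketch: estimate translation quality from direct evidence in
# the training corpus, measured here as n-gram coverage. Scoring scheme is
# illustrative only.

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def coverage_score(translation, target_corpus, max_n=3):
    """Fraction of the translation's 1..max_n-grams attested in the
    target side of the MT system's training corpus."""
    seen = set()
    for sentence in target_corpus:
        toks = sentence.split()
        for n in range(1, max_n + 1):
            seen.update(ngrams(toks, n))
    out = translation.split()
    candidates = [g for n in range(1, max_n + 1) for g in ngrams(out, n)]
    if not candidates:
        return 0.0
    return sum(g in seen for g in candidates) / len(candidates)

corpus = ["the cat sat on the mat", "the dog sat on the rug"]
print(coverage_score("the cat sat on the rug", corpus))   # fully attested
print(coverage_score("the cat sat on the sofa", corpus))  # partly attested
```

A low coverage score flags output n-grams the training data provides no direct evidence for, which is the kind of clue the paper exploits.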
Frequency effects in Linear Discriminative Learning
Heitmeier, Maria, Chuang, Yu-Ying, Axen, Seth D., Baayen, R. Harald
Word frequency is a strong predictor in most lexical processing tasks. Thus, any model of word recognition needs to account for how word frequency effects arise. The Discriminative Lexicon Model (DLM; Baayen et al., 2018a, 2019) models lexical processing with linear mappings between words' forms and their meanings. So far, the mappings can either be obtained incrementally via error-driven learning, a computationally expensive process able to capture frequency effects, or in an efficient, but frequency-agnostic closed-form solution modelling the theoretical endstate of learning (EL) where all words are learned optimally. In this study we show how an efficient, yet frequency-informed mapping between form and meaning can be obtained (Frequency-informed learning; FIL). We find that FIL well approximates an incremental solution while being computationally much cheaper. FIL shows a relatively low type- and high token-accuracy, demonstrating that the model is able to process most word tokens encountered by speakers in daily life correctly. We use FIL to model reaction times in the Dutch Lexicon Project (Keuleers et al., 2010) and find that FIL predicts well the S-shaped relationship between frequency and the mean of reaction times but underestimates the variance of reaction times for low frequency words. FIL is also better able to account for priming effects in an auditory lexical decision task in Mandarin Chinese (Lee, 2007), compared to EL. Finally, we used ordered data from CHILDES (Brown, 1973; Demuth et al., 2006) to compare mappings obtained with FIL and incremental learning. The mappings are highly correlated, but with FIL some nuances based on word ordering effects are lost. Our results show how frequency effects in a learning model can be simulated efficiently by means of a closed-form solution, and raise questions about how to best account for low-frequency words in cognitive models.
- Europe > Austria > Vienna (0.14)
- Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
- Africa > Kenya > Mandera County > Mandera (0.04)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)
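The EL-vs-FIL contrast in the abstract can be made concrete with a toy example: the frequency-agnostic endstate mapping is ordinary least squares between form and meaning matrices, while a frequency-informed closed-form mapping can be approximated as frequency-weighted least squares. This is a simplified illustration of the idea, not the paper's exact estimator, and all matrices below are random stand-ins.

```python
import numpy as np

# Toy contrast: endstate-of-learning (EL) mapping vs. a frequency-informed
# mapping (FIL), approximated here as frequency-weighted least squares.

rng = np.random.default_rng(0)
n_words, form_dim, sem_dim = 20, 8, 4
C = rng.normal(size=(n_words, form_dim))   # form (cue) matrix
S = rng.normal(size=(n_words, sem_dim))    # meaning (semantic) matrix
freq = np.geomspace(1000, 1, n_words)      # Zipf-like token frequencies

# EL: ordinary least squares, every word type weighted equally.
F_el = np.linalg.pinv(C) @ S

# FIL-style mapping: weight each word's row by its token frequency, so
# frequent words are mapped more faithfully than rare ones.
W = C.T * freq                             # == C.T @ diag(freq)
F_fil = np.linalg.solve(W @ C, W @ S)

err_el = np.sum((C @ F_el - S) ** 2, axis=1)
err_fil = np.sum((C @ F_fil - S) ** 2, axis=1)
print("frequency-weighted loss, EL :", float(freq @ err_el))
print("frequency-weighted loss, FIL:", float(freq @ err_fil))
```

The weighted mapping achieves a lower frequency-weighted loss (high token-accuracy) at the cost of a higher unweighted loss over word types, mirroring the low type- and high token-accuracy pattern the abstract reports.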
The Undesirable Dependence on Frequency of Gender Bias Metrics Based on Word Embeddings
Valentini, Francisco, Rosati, Germán, Slezak, Diego Fernandez, Altszyler, Edgar
Numerous works use word embedding-based metrics to quantify societal biases and stereotypes in texts. Recent studies have found that word embeddings can capture semantic similarity but may be affected by word frequency. In this work we study the effect of frequency when measuring female vs. male gender bias with word embedding-based bias quantification methods. We find that Skip-gram with negative sampling and GloVe tend to detect male bias in high frequency words, while GloVe tends to return female bias in low frequency words. We show these behaviors still exist when words are randomly shuffled. This proves that the frequency-based effect observed in unshuffled corpora stems from properties of the metric rather than from word associations. The effect is spurious and problematic since bias metrics should depend exclusively on word co-occurrences and not individual word frequencies. Finally, we compare these results with the ones obtained with an alternative metric based on Pointwise Mutual Information. We find that this metric does not show a clear dependence on frequency, even though it is slightly skewed towards male bias across all frequencies.
- North America > United States (0.14)
- South America > Argentina > Pampas > Buenos Aires F.D. > Buenos Aires (0.05)
- Europe > Portugal > Lisbon > Lisbon (0.04)
- Europe > Middle East > Malta > Port Region > Southern Harbour District > Valletta (0.04)
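The PMI-based alternative mentioned at the end of the abstract can be sketched as a co-occurrence association score: the difference PMI(w, "she") − PMI(w, "he"). The toy corpus and window handling below are invented for illustration and do not reproduce the paper's setup.

```python
import math
from collections import Counter

# Minimal sketch of a PMI-based gender association score:
# bias(w) = PMI(w, "she") - PMI(w, "he"), from raw co-occurrence counts.

corpus = [
    "she is a nurse", "he is a doctor",
    "she is a doctor", "he is an engineer",
]

def counts(corpus, window=4):
    pair, uni, total = Counter(), Counter(), 0
    for sent in corpus:
        toks = sent.split()
        uni.update(toks)
        total += len(toks)
        for i, w in enumerate(toks):
            context = toks[max(0, i - window):i] + toks[i + 1:i + 1 + window]
            for c in context:
                pair[(w, c)] += 1
    return pair, uni, total

def pmi(w, c, pair, uni, total):
    if pair[(w, c)] == 0:
        return float("-inf")  # unseen pair; unsmoothed for simplicity
    return math.log(pair[(w, c)] * total / (uni[w] * uni[c]))

def gender_bias(w, pair, uni, total):
    """Positive -> female-associated, negative -> male-associated."""
    return (pmi(w, "she", pair, uni, total)
            - pmi(w, "he", pair, uni, total))

pair, uni, total = counts(corpus)
print(gender_bias("doctor", pair, uni, total))
```

Because the score depends only on co-occurrence ratios, not on a learned embedding, it avoids the training-induced frequency artifacts the abstract describes.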
Introduction to Automatic Text Summarization
Sifting through lots of documents can be difficult and time-consuming. Without an abstract or summary, it can take minutes just to figure out what the heck someone is talking about in a paper or report. And if you need to get through hundreds of documents – good luck. A summarizer is an algorithm that extracts sentences from a text document, determines which are most important, and returns them in a readable and structured way. Automatic text summarization is part of the field of natural language processing, which is how computers analyze, understand, and derive meaning from human language.
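The extract-score-return pipeline just described can be sketched in a few lines. The word-frequency scoring below is one classic heuristic (in the spirit of Luhn's method); real summarizers use more sophisticated scores.

```python
import re
from collections import Counter

# Minimal extractive summarizer: score each sentence by the average corpus
# frequency of its words, return the top sentences in original order.

def summarize(text, n_sentences=2):
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sent):
        toks = re.findall(r"[a-z']+", sent.lower())
        return sum(freq[t] for t in toks) / max(len(toks), 1)

    top = sorted(sentences, key=score, reverse=True)[:n_sentences]
    # Re-emit in document order so the summary stays readable.
    return " ".join(s for s in sentences if s in top)

text = "Dogs bark. Cats meow. Dogs and cats and dogs play."
print(summarize(text, 1))
```

The key design choice is the sentence score; swapping in TF-IDF weights or graph centrality (as in TextRank) changes which sentences are extracted, not the overall pipeline.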
Comparison Between Global Vs Local Normalization of Tweets, and Various Distances
From the text mining literature, it appears that practitioners tend to use Cosine Distance to compare two documents, and they have used it with great success. In our previous blog post, we also used Cosine Distance and found it extremely helpful: it allowed us, and our clustering method, to get an insight into the UK Exit Referendum. Here, we decided to change our initial conditions and see if we get different outcomes; i.e., we decided to try four other distances: Jaccard, Matching, Rogers-Tanimoto and Euclidean.
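For binary document vectors, the four distances named above have simple closed forms in terms of the usual contingency counts (a = both 1, d = both 0, b and c = mismatches). A self-contained sketch:

```python
import math

# The four distances above, on binary (0/1) vectors. Definitions match the
# standard binary dissimilarities (as in scipy.spatial.distance).

def contingency(u, v):
    a = sum(1 for x, y in zip(u, v) if x == 1 and y == 1)
    b = sum(1 for x, y in zip(u, v) if x == 1 and y == 0)
    c = sum(1 for x, y in zip(u, v) if x == 0 and y == 1)
    d = sum(1 for x, y in zip(u, v) if x == 0 and y == 0)
    return a, b, c, d

def jaccard(u, v):
    a, b, c, _ = contingency(u, v)
    return (b + c) / (a + b + c) if a + b + c else 0.0

def matching(u, v):  # a.k.a. simple matching, the Hamming proportion
    a, b, c, d = contingency(u, v)
    return (b + c) / (a + b + c + d)

def rogers_tanimoto(u, v):
    a, b, c, d = contingency(u, v)
    return 2 * (b + c) / (a + d + 2 * (b + c))

def euclidean(u, v):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

u, v = [1, 1, 0, 1, 0], [1, 0, 0, 1, 1]
print(jaccard(u, v), matching(u, v), rogers_tanimoto(u, v), euclidean(u, v))
```

Note the key difference: Jaccard ignores shared absences (the d count), while Matching and Rogers-Tanimoto reward them, which matters a lot for sparse tweet vectors where most entries are 0.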
Comparison Between Global Vs Local Normalization of Tweets, and Various Distances
In the previous example we used clustering to see whether an apparent pattern exists within Brexit tweets. We found that there are three distinct patterns: the leave, the referendum, and Brexit. This in itself suggests that we might even create a classifier that can identify whether the tweet writer is pro or against an issue automatically, with no human intervention. Let's get back to the issues related to clustering. To use the clustering algorithm we had to map two tweets at a time to a binary vector.
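The mapping of a tweet pair to binary vectors can be sketched as follows; the tokenisation here is deliberately naive (whitespace split), whereas the original pipeline may have done more preprocessing.

```python
# Map two tweets to binary presence/absence vectors over their joint
# vocabulary, the representation the distance functions operate on.

def binary_vectors(tweet_a, tweet_b):
    toks_a, toks_b = set(tweet_a.split()), set(tweet_b.split())
    vocab = sorted(toks_a | toks_b)
    va = [1 if w in toks_a else 0 for w in vocab]
    vb = [1 if w in toks_b else 0 for w in vocab]
    return vocab, va, vb

vocab, va, vb = binary_vectors("vote leave now", "vote remain now")
print(vocab)   # ['leave', 'now', 'remain', 'vote']
print(va, vb)  # [1, 1, 0, 1] [0, 1, 1, 1]
```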
Human Reading and the Curse of Dimensionality
Whereas optical character recognition (OCR) systems learn to classify single characters, people learn to classify long character strings in parallel, within a single fixation. This difference is surprising because high dimensionality is associated with poor classification learning. This paper suggests that the human reading system avoids these problems because the number of to-be-classified images is reduced by consistent and optimal eye fixation positions, and by character sequence regularities. An interesting difference exists between human reading and optical character recognition (OCR) systems. The input/output dimensionality of character classification in human reading is much greater than that for OCR systems (see Figure 1). OCR systems classify one character at a time, while the human reading system classifies as many as 8-13 characters per eye fixation (Rayner, 1979), and within a fixation, character category and sequence information is extracted in parallel (Blanchard, McConkie, Zola, and Wolverton, 1984; Reicher, 1969).
- North America > United States > Kansas (0.06)
- North America > United States > Texas > Travis County > Austin (0.04)
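The dimensionality gap the abstract describes is easy to quantify: a single-character classifier faces at most one alphabet's worth of categories, while the space of possible character strings grows exponentially with string length. The numbers below are a back-of-envelope illustration, not figures from the paper.

```python
# Number of distinct letter strings of each length over a 26-letter
# alphabet: the input space a per-fixation classifier would face.
alphabet = 26
for length in (1, 5, 10, 13):
    print(length, alphabet ** length)
```

At the 8-13 characters per fixation cited above, the raw input space dwarfs what any OCR-style classifier could enumerate, which is why the paper's proposed constraints (consistent fixation positions, sequence regularities) are needed to shrink it.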